Entity Resolution and Federated Learning get a Federated Resolution

Authors

  • Richard Nock
  • Stephen Hardy
  • Wilko Henecka
  • Hamish Ivey-Law
  • Giorgio Patrini
  • Guillaume Smith
  • Brian Thorne
Abstract

Consider two data providers, each maintaining records of different feature sets about common entities. They aim to learn a linear model over the whole set of features. This problem of federated learning over vertically partitioned data includes a crucial upstream issue: entity resolution, i.e. finding the correspondence between the rows of the datasets. It is well known that entity resolution, just like learning, is mistake-prone in the real world. Despite the importance of the problem, there has been no formal assessment of how errors in entity resolution impact learning. In this paper, we provide a thorough answer to this question, showing how optimal classifiers, empirical losses, margins and generalisation abilities are affected. While our answer spans a wide set of losses (going beyond proper, convex, or classification-calibrated ones), it brings simple practical arguments to upgrade entity resolution as a preprocessing step to learning. As an example, we modify a simple token-based entity resolution algorithm so that it aims to avoid matching rows belonging to different classes, and perform experiments in the setting where entity resolution relies on noisy data, which is very relevant to real-world domains. Notably, our approach covers the case where one peer does not have classes, or has only a noisy record of classes. Experiments show that using the class information during entity resolution can buy significant uplift for learning at little expense from the complexity standpoint.
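
The idea of class-aware token-based entity resolution mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes whitespace tokenisation, Jaccard similarity and greedy one-to-one matching, and the names `tokens`, `jaccard`, `resolve` and `use_classes` are hypothetical. A row with an unknown class (`None`) is simply left unconstrained, which is one possible way to handle a peer without classes.

```python
from typing import Dict, List, Optional, Set, Tuple

Row = Tuple[str, Optional[int]]  # (identifying string, class label or None)


def tokens(text: str) -> Set[str]:
    """Lowercased whitespace tokens of a record's identifying string."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def resolve(left: List[Row], right: List[Row], use_classes: bool = True) -> Dict[int, int]:
    """Greedy one-to-one matching of left row indices to right row indices.

    Assumption of this sketch: when use_classes is True, a pair is skipped
    only if both labels are known and differ.
    """
    scored = []
    for i, (ltext, lcls) in enumerate(left):
        for j, (rtext, rcls) in enumerate(right):
            if use_classes and lcls is not None and rcls is not None and lcls != rcls:
                continue  # never match rows from different classes
            scored.append((jaccard(tokens(ltext), tokens(rtext)), i, j))
    # Highest similarity first; ties broken by row indices for determinism.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    matches: Dict[int, int] = {}
    used_right: Set[int] = set()
    for score, i, j in scored:
        if score > 0.0 and i not in matches and j not in used_right:
            matches[i] = j
            used_right.add(j)
    return matches


# Toy noisy data: the true correspondence is 0 <-> 0 and 1 <-> 1.
left = [("j smith sydney", 1), ("john smith sydney", 0)]
right = [("john smith sydney", 1), ("jon smith sydney", 0)]

plain = resolve(left, right, use_classes=False)   # matches on tokens only
aware = resolve(left, right, use_classes=True)    # forbids cross-class pairs
```

On this toy input the token-only matcher pairs the two rows that happen to share an identical string even though their classes differ, while the class-aware variant recovers the intended correspondence. The example is constructed to show the effect and says nothing about how often it occurs in practice.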

Similar resources

Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption

Consider two data providers, each maintaining private records of different feature sets about common entities. They aim to learn a linear model jointly in a federated setting, namely, data is local and a shared model is trained from locally computed updates. In contrast with most work on distributed learning, in this scenario (i) data is split vertically, i.e. by features, (ii) only one data pr...

The Effect of Transitive Closure on the Calibration of Logistic Regression for Entity Resolution

This paper describes a series of experiments in using logistic regression machine learning as a method for entity resolution. From these experiments the authors concluded that when a supervised ML algorithm is trained to classify a pair of entity references as a linked or not linked pair, the evaluation of the model’s performance should take into account the transitive closure of its pairwise lin...

Corpus based coreference resolution for Farsi text

"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...

The Role of Asserted Resolution in Entity Identity Information Management

This paper introduces the concept of asserted resolution as a technique for entity resolution. In asserted resolution trusted information sources are used to force the equivalence (or non-equivalence) of entity references and identity structures regardless of matching conditions. The paper proposes five specific forms of assertion to support entity identity information management, the process o...

A Negotiation Process Approach for Building Federated Databases

The negotiation process is often referred to in the literature on federated databases, but is seldom covered in depth. This process is essential to determine data of the component schema to be integrated for building a federated schema and the access permissions to be granted. This paper presents our negotiation process approach which is incorporated in the integration schemas mechanism, so we ...

Publication date: 2018